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Abstract 

The ability to accurately forecast and control inpatient census, and thereby workloads, is a critical 
and longstanding problem in hospital management. The majority of current literature focuses on optimal 
scheduling of inpatients, but largely ignores the process of accurate estimation of the trajectory of patients 
throughout the treatment and recovery process. The result is that current scheduling models are optimizing 
based on inaccurate input data. We developed a Clustering and Scheduling Integrated (CSI) approach 
to capture patient flows through a network of hospital services. CSI functions by clustering patients 
into groups based on similarity of trajectory using a novel Semi-Markov model (SMM)-based clustering 
scheme, as opposed to clustering by patient attributes as in previous literature. Our methodology is 
validated by simulation and then applied to real patient data from a partner hospital where we demonstrate 
that it outperforms a suite of well-established clustering methods. Further, we demonstrate that extant 
optimization methods achieve significantly better results on key hospital performance measures under CSI, 
compared with traditional estimation approaches, increasing elective admissions by 97% and utilization by 
22% compared to 30% and 8% using traditional estimation techniques. From a theoretical standpoint, the 
SMM-clustering is a novel approach applicable to any temporal-spatial stochastic data that is prevalent in 
many industries and application areas. 

Keywords: Clustering, EM algorithm, semi-Markov mixture model, patient flow estimation, stochastic location models. 

1 Introduction 

The mismatch between demand for and supply of medical services caused by high hospital census 

variability has challenged hospital managers for decades. High census variability is a common problem 
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in hospitals and healthcare centers around the world. This problem leads to poor quality of care, 
blocking in hospital wards, increase in inpatient length of stay, and ultimately causes significant in¬ 
crease in cost for both patient and hospital (Helm and Van Oyen 2015). Aiken et al. (2002) studied 
the effect of overloaded nursing staff induced by census variability and showed its effect on mortality 
rate, nurse burnout and job dissatisfaction. A common approach to managing census variability in 
practice involves hospitals procuring excess resources including material, staff, and equipment, leading 
to frequent instances of under-utilization for very expensive resources (Griffin et al. 2012). A better 
approach is to optimize the utilization of available hospital resources based on patient census estima¬ 
tions. This long-standing problem has been termed the Hospital Admission Scheduling and Control 
(HASC) problem, which can be decomposed into two main steps: census modeling (CM) and resource 
scheduling (RS). CM estimates distributional information (typically mean and variance) on patient 
census at the ward level, which is used as an input to the RS to find the optimal resource allocation 
plans and schedules for elective inpatient admissions. 

A significant body of work addresses the RS through a variety of optimization approaches, however 
research on effective census models that integrate with RS is less developed. In this paper, we develop a 
CM method that integrates well with existing RS methods to solve the HASC. We further demonstrate 
the importance of the CM component with respect to the outcomes of the RS optimization; a factor 
that has, to our knowledge, been unaddressed in the current literature. Namely, we show, through 
computational experiments and a case study based on data from our industry partner, that the CM 
method typically employed in RS optimization papers leads to markedly inferior optimization results. 
To conclude this section, we give a short description of the current state of the hospital census forecast¬ 
ing and optimization industry from the experiences of our industry co-author and CEO of a healthcare 
analytics company. Then we discuss challenges posed by the gaps in CM theory that represent a major 
hurdle for this burgeoning industry and discuss how our approach seeks to bridge those gaps. 

1.1 Real-world Challenges in the Hospital Census Forecasting Industry 

Predicting future hospital census levels is a key challenge in the Hospital Admission Scheduling and 

Control (HASC) problem. Without accurate forecasting mechanisms, controlling the variability in 
hospital census becomes difficult and creates a major barrier to low cost, high quality inpatient care. 
These consequences of inadequate forecasting are drawn from real-world experience, where our co¬ 
author has worked with clients and collaborators globally - Asia, Europe and North America. All the 
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hospitals he has worked with experience significant mid-week congestion and high levels of blocking. 

Current methodologies used in hospitals are ineffective to solving the HASC problem. Almost all 
the hospitals have lean teams focused on process improvement and some of the bigger hospitals have 
small analytics teams that use rudimentary models which are ineffective at implementing changes made 
to solve the HASC problem. All the work done at the hospital level are reactive models (predicting 
census levels using historical census means, and applying control by canceling surgeries the day before) 
versus proactive models (implementing control measures in advance). Recently, some hospitals have 
been attempting to shift to proactive measures. This has typically involved increasing capacity and 
lowering utilization, which is cost prohibitive in the long-run. The real solution is to improve the 
forecasting technology. The methods outlined in this paper have proven to be effective on a conceptual 
level with results shared in the later sections. 

Our collaborator, company XYZ (the real name of the company is currently disguised for the review 
process), is one of the first to provide a patient level forecasting tool; i.e. predicting individual flows and 
trajectories of each type of patient entering the hospital. A patient level forecasting and control tool 
is imperative for hospitals to effectively solve the HASC problem. While forecasting is the backbone 
to the solution, XYZ also provides the ability for hospitals to create what-if scenarios by modifying 
admission plans and schedules and to use optimization techniques to customize a dynamic admission 
plan to minimize blocking and surgical cancellations. This type of analysis and decision support is 
only possible through patient-level forecasting, as it requires understanding how patient-by-patient 
modifications to the admission schedule impact hospital census and blocking. This is precisely the 
type of forecasting that we propose in this paper. In fact, workload forecasting is not only useful 
for bed planning purposes, but is key to allocating resources to the various functions of the hospital. 
Most notably, workforce planning for front and back end staff accounts for over 50% of hospital costs. 
Based on the feedback received from XYZ clients, properly allocating staffing reduces various costs, 
like overtime, and improves staff satisfaction. Overall, it is one key in keeping hospitals profitable 
and delivering top quality healthcare. After discussing the various needs of the hospitals, it is clear 
that the key issues in patient flow management, staffing, and scheduling all rely on the critical role of 
forecasting flexibility and accuracy. 

One ongoing challenge for XYZ is the issue of defining Patient Types (PTypes) and estimating 
their probabilistic trajectories over the course of their hospital stay, both of which have a major 
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affect on forecast accuracy. From a computational standpoint, it requires clustering patients into 
groups, where each group represents one type of patient. Currently, XYZ employs various forms 
of regressions to determine factors to group similar patients together into clusters based on patient 
characteristics. Many assumptions must be made to fit data into logical PTypes that are scalable and 
yet give enough information to statistically differentiate patients and enable accurate forecasting. This 
includes applying numerous heuristics and unfortunately, sacrificing the accuracy of the forecast. At 
XYZ, this process is currently done manually for each hospital, often requiring weeks to months of effort 
to properly tailor the PTypes for accurate forecasting. These issues of scalability, repeatability, and 
demonstrated statistical accuracy represent one of the major hurdles for XYZ and other participants 
in the patient-level forecasting space. The methods presented in this paper help solve a key problem 
in parameterizing models for each hospital. Specifically, by clustering patients based on trajectory 
(rather than extrinsic characteristics as in current practice) this paper significantly improves upon 
the currently time consuming and heuristic step of assigning PTypes. Our approach is shown to be 
scalable, statistically rigorous, accurate, and repeatable. This eliminates the time consuming, gestalt 
guess work inherent in current practice and has proven to significantly increase forecast accuracy in 
addition to improving the results from current decision support methods for admission scheduling. 

1.2 Failures of Traditional CM Methods. 

As noted in Fetter et al. (1980) and Helm and Van Oyen (2015), an appropriate HASC model should 
have three characteristics: scalable to hospitals of any size, consider ward interactions, and account 
for patient heterogeneity. Most work in the RS step assumes that patient types are given and uses 
simple methods for estimating patient trajectories, then employs analytical techniques to capture key 
hospital metrics in an optimization model. A patient trajectory is characterized by the transitions 
between wards in a hospital and patient Length of Stay (LOS) in each ward, and can be expressed as 
a stochastic function called a location process that maps time to a set of locations — see Fig. [^for an 
example of two sample path outcomes of a location process. 

While the optimization methods are generalizable, the previous approaches to CM for RS optimiza¬ 
tion lack scalability and are not well suited for capturing patient heterogeneity. These approaches also 
suffer from the limitation of not properly capturing ward interactions, which is shown to be important 
in Sec. |4.2| and[^ where we compare our method with traditional methods that fail to properly account 
for ward interactions. In this paper we address these issues by developing new methods for clustering 
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Figure 1: Example of sample path outcomes of a stochastic location process for patient flow. The x-axis shows 
the time after admission, while the y-axis denotes the ward the patient is in at time t; each step is a change of 
ward for the patient. 

patient location processes based on historical patient flow data. 

As an example of how clustering impacts scalability and patient heterogeneity, consider the follow¬ 
ing. Many traditional approaches to HASC cluster patients by their diagnosis related group (DRG) or 
admitting service. However, in working with a large hospital such as our partner hospital, there can 
be close to one hundred such patient types with quite a few of them being very rare. With such a 
large number of patient types we have found that there is insufficient data to properly estimate patient 
trajectories even with two or more years of historical data. When further including other important 
factors such as gender and age, which have been found to be important in determining a patient’s 
trajectory, data scarcity becomes an even larger problem. Current solutions include combining differ¬ 
ent patient types that are deemed "similar" in order to have sufficient historical data for trajectory 
estimation. This is a clustering problem. Deciding how to combine patient types, however, is a non¬ 
trivial effort considering the entire location process (time and location) must be compared to ensure 
an accurate pairing of two patient types. For example, two patient types may have the same average 
length of stay (LOS) in the hospital but visit different wards. Another example is if two patients visit 
similar locations with similar mean LOS, but one has a skewed LOS distribution and the other does 
not. These factors can all have a significant impact on census forecast accuracy (see Littig and Isken 
2007). Because different hospitals have different methods for categorizing patients (different admitting 
services, DRGs served, etc.), this requires a lengthy and ad-hoc procedure to be performed at each 
new hospital, significantly impacting scalability. For example, our industry co-author has indicated 
that this process of clustering under current methods is unique to each hospital and can take months 
to adequately determine patient types in large hospitals. 

A second problem is that once the patient types have been identified, trajectories are assigned based 
solely on what patient type the patient is identified as. For example, if the patient is a bladder cancer 
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surgery patient their cluster will be bladder cancer surgery. However, other factors that may impact 
the patient’s trajectory and LOS, such as age and gender, cannot be considered after the patient types 
are defined. This approach is only as good as the granularity of each cluster. However, the clusters 
are not defined based on the shape of patients’ location functions, but rather on other factors available 
in the data that are believed to be associated with the shape of the location function, but have not 
been statistically validated. Finally, clusters cannot be too granular or data will be insufficient. This 
phenomenon impacts both the ability to capture patient heterogeneity and to accurately estimate 
patient paths because patients are forced into predefined groups rather than assigned a type that most 
closely matches their projected trajectory. 

In contrast, we develop a new clustering approach that clusters patients directly according to 
similarity of their trajectories (which is what we want to estimate) in a statistically rigorous manner, 
rather than using these ad-hoc proxies (e.g. DRG, age, gender). Specifically, we seek to close the gap 
in the literature by developing new methods for the CM step that provide more effective and scalable 
clustering of patient types, and a better estimation of the patient trajectories for each patient type. 
The proposed model, which we call clustering and scheduling integration (CSI) is scalable, captures 
the interactions between hospital wards, and is capable of handling patient heterogeneity. CSI begins 
with the CM module in which heterogeneous patients are clustered based on the similarity of their 
trajectories. This provides patient types for accurate estimation of patient trajectories and patient 
census distributions at the ward level. Finally, these estimates serve as inputs to the RS module to 
find an optimal hospital resource schedule, which is then shown to outperform the same optimization 
model using traditional CM methods. 

For CM, we propose a novel semi-Markov mixture model (SMM) that integrates the mixture clus¬ 
tering method and semi-Markov models accurately describing stochastic location processes of patient 
trajectory. To the best of our knowledge, this SMM clustering technique has not been proposed before 
in the literature, either for the HASC problem or any other problem. The SMM not only clusters 
patients based on their trajectory, but also provides accurate estimates for the trajectory distribution 
of each group of patients. In the RS module, the output of the CM is fed into an MIP model similar to 
the model proposed by Helm and Van Oyen (2015) to find the optimal resource schedule for hospitals. 

We further show through a case study using real data from a partner hospital that system perfor¬ 
mance is significantly impacted by the quality of the input from the CM step. In fact, using CSI to 
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parametrize the optimization can enable np to a 50% increase in elective admissions while maintaining 
the same level of blocking and internal congestion when compared with the same optimization nsing 
the traditional estimation approach. Similarly, it is possible to have higher ward ntilization compared 
with traditional CM approaches holding all other metrics constant. 

The remainder of the paper is organized as follows. We hrst review the literatnre and position the 
paper in Sec. Next, we develop the new CSI methodology in Sec. in which the SMM clnstering 
method for CM is discnssed in detail, followed by a brief description of the MIP model nsed for RS. 
Then in Sec. we nse simnlation to validate the proposed CSI model in terms of the accnracy of 
estimates and the optimality of solntions. In Sec. we apply onr CSI methodology in a case stndy 
based on historical data from a partner hospital. Finally, in Sec. |^we conclnde the paper and discnss 
fntnre opportunities. 

2 Literature 

Most existing research in the HASC area has focnsed on either CM or RS separately. Little work can be 
fonnd on integrating CM and RS in a cohesive framework. Additionally, existing HASC approaches lack 
at least one of the aforementioned characteristics of an effective HASC model. The aim of this paper 
is to develop an HASC framework that is scalable, acconnts for patient heterogeneity, and considers 
ward interactions throngh effective integration of CM and RS. 

In the HASC literatnre, varions stochastic and deterministic models have been developed for RS. 
Green (2006) and Armony et al. (2011) nsed qnening models to optimize resonrce schednling. Ward 
interactions were not taken into acconnt in either of these papers. Unlike the qnening models, simnlation 
models developed for RS are more flexible and consider the interaction between wards, mostly by nsing 
patient pathways between wards in a hospital. Examples of simnlation-based models inclnde Hancock 
and Walter (Hancock and Walter (1979, 1983)), Griffith et al. (1976), Jacobson et al. (2006), Harper 
and Shahani (2002), Zeltyn et al. (2011), and Konrad et al. (2013). However, simnlation models 
are case-specihc, cannot be easily generalized or scaled, and rely on the same, less effective PType 
and path estimation techniqnes mentioned earlier. Adan et al. (2009), Bekker and Koeleman (2011), 
and Zhang et al. (2009) nsed Mixed Integer Programming (MIP) models for optimal RS. These works, 
however, only focns on either one ward or an isolated feed-forward snbset of the hospital, ignoring ward 
interactions. To address this issne. Helm and Van Oyen (2015) proposed a non-henristic MIP schednling 
model that also nsed patient pathways to model ward interactions of an entire hospital. Althongh the 
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RS portion of the model is scalable and considers ward interactions, it does not properly handle patient 
heterogeneity. Moreover, an empirical method (similar to the traditional method described above) was 
used to estimate the patient census at the ward level, which we show can degrade the value of the 
optimal solution. 

For RS optimization to be maximally effective, an accurate CM is required to estimate patient 
arrival rates, their trajectory through the hospital, and, by combining arrival and trajectory, the 
patient census at both the ward and hospital levels. Regression analysis and time-series modeling have 
been widely used for forecasting inpatient admissions and hospital occupancy (Earnest et al. (2005) and 
Jones et al. (2002)). Abraham et al. (2009) reviewed and compared several models for forecasting daily 
emergency inpatient admissions and occupancy. They found that the admissions are largely random 
and hence non-pr edict able, whereas occupancy can be forecasted using a model combining regression 
and ARIMA, or a seasonal ARIMA, for up to a week ahead. Their model is capable of forecasting 
the overall hospital occupancy, but not the occupancy at the ward level. Consequently, it does not 
account for ward interactions. These approaches are also incapable of capturing what-if scenarios or 
optimization with respect to inpatient admission decisions. Littig and Isken (2007) used occupancy 
flow equations to estimate occupancy at different units or wards of a hospital. They predicted patient 
in- and out-flow using time series and multinomial logistic regression models. They combined these 
predictions and fed them into a set of flow equations to find the net estimate of the number of patients 
in a given ward. However, implementing this model in real time presents a major challenge, as even a 
simple model requires coordination between a variety of real time data sources and the computational 
burden of the method is high, so scaling this model to large hospital would be difficult. 

To model patient trajectory and LOS, Irvine et al. (1994) and Taylor et al. (2000) proposed a 
continuous time Markov model for geriatric patients. This model, however, was developed for few 
wards and lacks scalability. Moreover, the assumption that the LOS at each ward follows the same 
exponential distribution is not often a good model of reality. Faddy and McClean (2000) used Phase- 
type distributions for patient flow modeling. They interpreted phase-type distributions as a mixture of 
components (phases) characterized by the severity of patient’s illness. Marshall and McClean (2003) 
extended this idea and developed a model based on Conditional Phase-type distributions combined with 
a Bayesian Network to be able to include a network of inter-related variables representing causality. In 
phase-type methods, it is assumed that the process begins in the first phase and may either progress 




(a) Phase-type model: Patients can transition in a sequential 
order or leave the system 



(b) Semi-Markov model: Any back and forth transition from any 
ward to any ward is possible 

Figure 2: Illustration of patient trajectory models for a hospital system 


through the phases sequentially or enter an absorbing state (see Fig. 2a). Consequently, these methods 
cannot be extended to capture patient trajectories, where patients revisit a ward several times or 
transition from any ward to any other ward, which is a significant feature according to our data. 
Thomas (1968) and Kao (1972, 1974) proposed a semi-Markov model to predict recovery progress of 
coronary patients. This can model any hospital system with complicated ward interactions in any 


direction (See Fig. 2b). Thus, this model has scalability and can fully model ward interactions but is 
built only for a “homogenous” mix of patients, i.e. coronary. 

Patient heterogeneity is another challenge in CM. To address this challenge. Helm and Van Oyen 
(2015) partitioned patients into homogeneous clusters with respect to their diagnosis using diagnosis 
related groups (DRGs). DRGs have been also used by Fetter et al. (1980) for regional planning. Harper 
(2005) provided a comprehensive review on clustering techniques, including CART, k-means, neural 
network, etc. that use more patient attributes (e.g., age, sex, diagnosis) to find more homogeneous 
clusters. The main assumption of the DRG and attribute-based methods is that patients who belong 
to a cluster, follow a similar trajectory. However, this is not necessarily true. Littig and Isken (2007) 
shows that, patients with similar attributes (e.g., age, sex, diagnosis, etc.) can often have very different 
trajectories. As an example from our own data. Fig. in Sec. shows that although two patients 
shared the same age, sex, and diagnosis, their trajectories were very different. 

In conclusion, the problem of trajectory estimation from a heterogeneous cohort of patients is 
important. To our knowledge, existing literature fails to address at least one or more challenges 
among: scalability, ward interaction, and heterogeneity. In the next section, we develop a methodology 
to address all three challenges and close this gap. 
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Figure 3: Clustering and Scheduling Integrated (CSI) Model overview 


3 Clustering and Scheduling Integrated (CSI) Model for HASC 

Fig. [^provides a high level overview of our methodology. First, historical patient flow data, taken from 
admit-discharge-transfer (ADT) records, is used to group the patients based on their trajectory using a 
semi-Markov Mixture (SMM) model-based clustering approach. The parameters for the semi-Markov 
processes of patient trajectory for each cluster are estimated as a part of the clustering process. These 
stochastic location processes are then combined with a model of the non-stationary patient arrival 
process to form a stochastic process (a Poisson arrival-location model or PALM, see Massey and Whitt 
(1993)) that captures the ward-network census. Estimation of this stochastic network census process 
enables the derivation of three important products for hospital managers: (1) Descriptive: accurate 
census forecasting, (2) What-if scenarios: impact of potential modihcations to admission schedules, 
and (3) Prescriptive: MIP-based admission scheduling optimization. 


3.1 Semi-Markov Mixture (SMM) Clustering for Modeling Patient Trajectories 

When a new patient arrives to the hospital, they are initially assigned a bed in a hospital ward. The 

patient stays at that ward for a stochastic duration and then transfers to another ward or is discharged 
from the hospital. This process repeats if the patient is transferred to another ward of the hospital. 

A general hospital serves a cohort of many different types of patients. Each type of patient requires 
different services during their hospital stay. The hrst task is to identify patient types through clustering. 
As mentioned previously, conventional clustering methods are not applicable to this problem due to 
the fact two patients with the same observed attributes often have different trajectories. 

To manage the heterogeneous mix of patients in a hospital, we develop a semi-Markov mixture 
model for clustering based on patient trajectory rather than predefined groupings based on patient 
attributes. Patients in each cluster are assumed to follow a semi-Markovian trajectory through the 
hospital, which has been validated in the literature (e.g. Hancock et. al. 1983). The SMM produces 
three important products that signihcantly improve the generality and scalability of our method: (1) 
appropriate patient groupings based on trajectory, (2) the optimal number of patient types, and (3) 
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accurate trajectories for each patient type. In Sec. we show that this approach yields more efficient 
patient clusters and more accurate trajectory models than traditional approaches. Moreover, to the 
best of our knowledge, there is no existing approach for developing a semi-Markov mixture model and 
using it for clustering spatial-temporal data. 

3.1.1 SMM Model Structure 

Let /C be the set of unknown patient types, where each patient type’s trajectory follows a unique semi- 
Markov process. The population of patient trajectory data, thus, follows a mixture of an unknown 
number, |/C|, of semi-Markov processes. Each mixture component, which we call a cluster henceforth, 
has a different semi-Markov process distribution. The first step is to determine the set of clusters ( 
~ tC) and estimate their corresponding trajectory distributions. 

Consider a sample of trajectory data for N patients observed over a maximum time period of length 
T. Time is measured by discrete units, for example, a day, quarter of day, hour, to be chosen depending 
on the desired granularity. The set of possible lengths of stay is denoted by T = {1, 2,... ,T}. Let 
U = {^, Z^} denote the set of all states (wards) where U is the set of all transient states and U is the 
set of all absorbing states. The first state when the patient enters the system (hospital) is called the 
initial state and the last state, which is an absorbing state, indicates a patient’s end of stay in the form 
of discharge or death. All the states during the patient’s hospital stay are transient states. The set of 
initial and transient states are the same, as a patient may enter the hospital at any arbitrary location. 

A patient n’s (re G N) trajectory is represented as = ({rei, rei},..., {u£^{n), {n}), where 

ui indicates the visited ward, I'l € T is the length of stay at the corresponding ward, re G ZZ is the 
absorbing state from where the patient leaves the hospital, and subscript I, I = 1,2, indicates 

the sequence of ward visits [state and ward are used synonymously in this paper). is the patient 
re’s path sequence length. This model can capture general network behavior, as there is no restriction 
on the number of times a patient can visit any particular ward. 

We formulate the problem by defining a set of parameters, 0 = {0*^^^}, k G JC. Each is 
comprised of the mixture weight, Tr^^\ and semi-Markov process parameters, for the 

k-ih. mixture. The mixture weight, denotes the probability of a randomly chosen patient belonging 
to cluster k. Letting be a hidden variable representing the cluster index for patient re, then the 
mixture weight can be expressed as, = p@[z = k). Also, ~ 

Of the remaining mixture parameters, u GU, denotes the initial state probability. It 

can be expressed as = p©(rei = u\z = k), the probability of the first state of a patient trajectory 
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being ward u given the patient belongs to clnster k. The matrix = [Puj], u, j G U, is the transition 
probability matrix, where = p&{ui = j\ui-i = u,z = k), the probability of transitioning from 
ward u to j for a patient in clnster k. Finally, u, j £ U, v £ T, is a three-dimensional 

tensor representing the holding mass distribntion, where H^\u) = p@{vi = v\ui = j, = u, z = k) 
gives the probability of a patient in clnster k spending u time units in ward u before transitioning to 
ward j (from u). As P^^\ are probability distribntions, the following hold: 

= 1, Pff = 1, and Y, = 1 (1) 

new jeW i/eT 

Using this parameterization, we represent the conditional probability of any patient n’s trajectory, 
given it is generated by clnster k, in Eq. The hrst part of the eqnation is the initial state prob¬ 
ability. The terms inside the prodnct is the transition probability times the holding time probability 
corresponding to the transition the patient made, and the amonnt of time the patient spent at the 
ward before transitioning. 

1^1 

= (2) 
1=1 

Conseqnently, by considering the probability of belonging to each clnster, k, the probability distribntion 
fnnction (pdf) of the SMM model with JC components is written as 


p(y(")|0) = ^p0(0(") =A:)p0(y(”)|z(”) =fc) 


kGJC 


kGJC 


L<") 


p(fc) TT / p{k) . pik) / O 




(3) 


Given an i.i.d. sample of N patient trajectories, Y = {y("');n = 1,... ,N}, the likelihood fnnction is, 
thns, given by 


N 


N 


P@ 


0(Y)=np(y^”^i®)=nE' 


-(fe) 


n—lk^K 


r(n) 


p(fe)TT/p(fc) .7j(fe) 




(4) 


The parameters of the SMM mixtnre model, 0, can be estimated by maximizing the (log) likelihood 
fnnction in Eq. However, if there is no observed transition between any two states or no instance 
of any particnlar length of stay, the likelihood fnnction becomes zero. To avoid this issne, we nse a 
Bayesian approach that assigns very small prior probabilities to all model parameters, denoted by p(0). 
Thns, according to Bayes rnle, the posterior probability for 0 can be expressed as p(0| Y) = 
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Since p(Y) is independent of 0, it suffices to maximize the non-normalized posterior log-likelihood in 

Eq. [^to obtain the optimal 0*, also known as the maximum a posteriori (MAP) estimates of 0. 

©* = argnmxlog{p(Y|©)p(©)} (5) 

The optimization problem in Eq. does not have a closed-form solution. Further, the non- 

normalized posterior log-likelihood function is non-convex so Eq. cannot be solved using standard 

convex optimization methods. As a result, we develop an iterative expectation-maximization (EM) 

procedure in the following section to obtain the parameter estimates. 

3.1.2 Parameter Estimation via Expectation-Maximization (EM) 

An Expectation-Maximization (EM) algorithm is an effective approach for learning maximum likelihood 
or maximum a posteriori (MAP) estimates, where the likelihood is a function of unobserved latent 
variables (in our case, z). It is an iterative approach comprising of an Expectation (E-step) and 
Maximization (M-step) in each iteration. In the E-step of any iteration p, we obtain a lower bound on 
the objective function by taking its expectation at the current parameter estimate, 0^^^. Then, in the 
M-step, we re-estimate the parameters (update), to obtain 0(P+^), that maximizes the expectation from 
E-step. This procedure results in an increase of the likelihood function with guaranteed convergence 
under some weak regularity conditions that are satisfied in most practical situations (Wu, 1983). The 
specific EM algorithm we develop for the SMM mixture model is as follows: 

E-step 

We find the expected value of the maximum a posteriori function in Eq. with respect to the current 
parameter estimate, denoted by Q(0|0^^^) in Eq. 

g(0|0(?’)) = Eew[log(p(Y|0)p(0)] (6) 

For a simpler expression of the Q function in Eq. we define a membership probability distribution. 
Membership probability, denoted by ^nk, is the probability of observing any patient n’s trajectory, 
generated by cluster k, given parameters 0 (see Eq. |^. 

= k) 

Efc'eyC 7r(''')pQ(yW|2;W = k') 

= Pnk{@)]; n = l,...,N,k€lC (8) 

The Q function, can thus be expressed as. 


^nk{^) 

n(&) 
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Q(0|©(P)) 


EqW [log(p(Y|0)p(0)] 


N 


EE log = A:)J +logp(0) 

n=l keJC 


( 9 ) 


M-step 

In the maximization step, the parameters that maximize the Q function are estimated. The updated 
parameters are, thus. 


0(^^i) = argm^x |(5(0|0(^))| (10) 

To solve Eq. [T^ we will estimate the posterior of the parameters using a Dirichlet prior probability 
distribution for 0, p{&). The Dirichlet distribution is chosen because 1) the parameters of a hrst-order 
semi-Markov mixture are in the form of multinomial probabilities, which are suitably represented by 
Dirichlet distribution, and 2) the conjugate of Dirichlet is also a Dirichlet distribution, thus posterior 
computation is straightforward. 

For any set of multinomial parameters, x = (xi^..., Xm), such that = Ij 0 E < 1, a 

Dirichlet distribution is given by, 

m 


. . . , ? ^m) 


B(a 


n 

i=l 


.ai-l 

i 


( 11 ) 


nT=i na^ 


a constant factor for the Dirichlet 


where Oj’s are hyperparameters for x, and B(a) = m 

r (Ei=i 

probability distribution function. Using the prior probability distributions, assumption of independence 
of parameters, and plugging Eq. [^into Eq. we obtain the posterior distributions. We show in Online 
Appendix A, the posterior distributions are Dirichlet, and how to update parameters to maximize 
Eq. The derived expressions are shown below. 


= 




+ a. 


(k) 




,yk G jc. 
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En=l 

Z^i/'eT 

En=l ^^nfc(0(P))K„j>/(y(^)) + 


, Vtt, j £V(,k £ fC 



Vu ,j £U,v £ T£ tC 


3.1.3 SMM-Clustering Algorithm 

In this section, we detail the specihc algorithm for implementing SMM clnstering (see Algorithm Q 
and discnss key featnres snch as sensitivity to initialization, identihability, compntational complexity, 
and optimization acceleration techniqnes. 

As shown in Algorithm we take the trajectory data and the nnmber of clnsters as inpnts. The 
estimation procednre is initialized by randomly assigning a clnster to each patient, snch that each 
clnster in has at least one patient. The membership probabilities in are initialized as 

nniform (any other random assignment can also be done). The Dirichlet prior hyperparameters, a(.), 
are chosen as a small nnmber and nniform for all parameters (e is a small positive nnmber, taken as 
le —5 in onr experiments). Thereafter, iterative estimation is done, where the membership probabilities 
and the SMM parameters are npdated in each iteration. 

In onr implementation, we set a termination condition so that the algorithm terminates after 
maxiter iterations. Iterations can also be performed nntil a given measnre of convergence. For example, 
convergence can be measnred in terms of either no change in the objective fnnction or hard clnster 
assignments of the trajectories (z) - i.e. the clnsters do not change mnch between iterations. Tracking 
of clnster reassignments (in each iteration) works better than tracking the objective fnnction, as the 
change in latter becomes extremely small after few iterations. Bnt, in practice, having an ripper bonnd 
for the nnmber of iterations (maxiter) is more nsefnl dne to potential identihability issnes. Especially 
when the data size is large, it can take a very long time for clnster reassignments of all trajectories to 
stabilize between iterations, maxiter serves as a reasonable trade-off between compntation time and 
accnrate resnlts, and hence is commonly employed in many clustering implementations. 

The runtime computational complexity of the algorithm is linear in the length of sequences, the 
sample size, the number of clusters, and the number of iterations, i.e. 0(maxlter * KNL), where 
L is average sequence length. This linear complexity makes the implementation fast. Additionally, 
several steps in the algorithm can be vectorized, e.g. parameter normalization, for increased speed. 
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Algorithm 1 SMM-Clustering Algorithm 


Input: Trajectory data, Y = n = 1,..., N}, number of clusters, K. 

Initialize: 


^ rand(l, K),n = 1,..., N. 
n •«— {i/k}nxk 
Oir ^ ^/K; ttp ^ \u\xk''' 


dp i — 


\U\x\U\xk'^ ^ \U\x\U\x\U\xK 

for iter = 1,..., maxiter do 

0 ^ SMMPARAMETERS(n,z) 
n ^ MembershipProb(0,z) 

2(") argmaxfeO„,fe,n = 1,..., A 

end for 

function MembershipProb( 0, z) 

n -tr- OnxK 

for k = 1,... ,K do 
for n = 1, ... ,N do 

Fetch trajectory sequence, = ({ui, i^i},..., }, {u}) 

^n,k ^ 

for ^ = 1,..., do 


> s.t. at least one trajectory assigned to each k 

> Prior hyperparameters 


I ^n,k 

end for 
end for 
end for 


^n,k * P 


(fc) 

ui,ui + 




n 


n,k 




;fc = n = 1,... ,N 


> Normalizing for, ^n,k = 1, Vn = 1,..., A 


return fl 
end function 

function SMMPARAMETERS(fI, z) 

a,r + X]n=l ^n,k'i k = 1, . . . ,K 
7r(fc) ^ 7rW/X;^=i7rfc/ 

P ^ ^UxK] P ^ OuxUxK', H ^ OuxUxTxK 
p^u'’ ^ ap;u €U,k = 1,... ,K 

^ ap - u, u' G U, u ^ u\ k = 1,..., K 

^ aii',u,u' GU,u ^ u' ,v GT,k = \,... ,K 

for n = 1,..., A do 

I Fetch trajectory sequence, y(”) = {{ui,vi },..., {u, v, {u}) 

I Pui ^ Pui + ^n,2(’*) 

for ^ = 1,..., do 


> Normalizing 


zM 


pz<"' , pz'"' Q 


Ul + 

I end for 
end for 

for k = 1,... ,K do 

I pi^^ ^ Pu^ I Yxu'm Vu e fc = 1 ,..., A 

I Kl ^ Pj]/ Pi^ir,yu,jGU,k = l,...,K 

I Hg(p) ^ Hi';](p)/E.,er J GU,pGT,k = l,...,K 

end for 

return 0 = pW, pW, H^; fc = 1,...,A} 

end function 

Output: 0 = {0^^)}, fc = 1,..., A; and cluster assignments z. 


> Normalizing as per Eq. 
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Computation time can be further reduced by: (1) parallelization: since the parameter update equations 
are independent for each cluster, state, and length-of-stay, we can split the computation across many 
computing nodes; and (2) stochastic clustering: using a random subsample of data for parameter 
updating in each iteration. Parallelization requires multiple computing nodes, while the stochastic 
clustering is suitable when the data sample is very large. A higher order Markov clustering extension 
of our proposed model will increase the computational complexity, and therefore may require some or 
all of the above techniques for tractable solution times. 

Similar to most other clustering methods, the SMM-clustering results are sensitive to the initial¬ 
ization. While some experts suggest using prior system knowledge for initialization, such as diagnosis- 
related-groups (DRG), we feel such an initialization may introduce bias into our results; particularly 
because a main motivation for developing this approach was that DRG clustering was not sufficiently 
accurate in practice. We, therefore, recommend random cluster initialization. To avoid potentially 
poor solutions resulting from a particular initialization, we perform multiple runs with different ran¬ 
dom initializations and choose the solution with the highest final objective function. 

3.1.4 Determining the number of clusters 

To determine the appropriate number of clusters, we estimate the SMM model and compute the Q 
function, which is analogous to the likelihood. We then increase the number of clusters, |/C|, by 
one at each iteration. We stop when there is no significant change in the Q function by adding an 
additional cluster (popularly known as the elbow method). To eliminate redundant clusters, we perform 
pairwise hypothesis tests with controlled type-I error for the identified clusters. We use the Ghi-square 
hypothesis test developed by Billingsley (1961a, 1961b) for comparing transition probabilities and 
Kolmogorov-Smirnov for comparing the distributions on the initial state and the holding time. We 
merge any clusters that are found similar by these tests and then perform the tests again in iterative 
fashion until no redundant clusters are detected. A similar approach for removing redundant clusters 
was used by Weiss et. al (1982). 

3.1.5 Trajectory estimation for each cluster 

After parameter estimation, the next step is to estimate the patient trajectory distributions which 
are characterized by the visited wards and length of stay at each ward. Using the selected number 
of clusters and corresponding semi-Markov process estimates from our EM algorithm, we compute 
the probability distribution of patient trajectory, denoted by T{d) = ['yj^\d)]', j ^ h(,k ^ 1C and 
4 = 1,2,..., where 'yj^\d) is the probability that a patient of cluster k is in ward j after d days (we use 
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a day as a time unit for v). This distribution is one of the key inputs to the scheduling optimization. 

To estimate T{d) we use interval transition probabilities, {d)]] u,j £ lJ,k £ JC and 

d = 1 , 2 ,..., where (j)^^j{d) is the probability that a patient in cluster k is in ward j on day d, given 

that the patient entered the hospital in ward u. Recalling that, for a type k patient, H^\d) is the 

(k) 

holding time probability distribution in ward u before transitioning to ward j and ' is the probability 
of transitioning from ward u to j, then (p^j (d) is computed as 


Ak) 


l&U\{u} d'=d+l l&u\{3} d' = l 


( 12 ) 


where 5uj = < 


u= J 


0, w / j 


and (/)2^(0) = < 


u = J 


0 , u^j 


A patient starting in state u can be in state j on 


day d either if the patient stays in ward u for d days before transitioning to ward j (the first term of 


Eq. 12), or u = j and they never left u during the period [0, d] (the second term of Eq. 12), or the 


patient left u at least once and finally reached j by day d (the third term of Eq. 12). Consequently 
7 ^^^ (d) can be expressed as sum-product of all possible initial states to ward j (Eq. 


13). 




(13) 


ueu 


7 from Eq. 13 becomes an input to the scheduling model explained in next section. The semi-Markov 
process estimates, ©, can be used for finding the length-of-stay distribution of each patient type as 
well as the expected mean length of stay in each ward and its variance. Equations to compute these 
are given in the following subsection as they may be useful for other research objectives or purposes. 

3.1.6 Computing Patient length-of-stay distributions 

Length-of-stay in a ward (R). For a patient of type k, we estimate the expected days spent by 
the patient in each ward using the indicator function on the interval transition probability Let 


yd) = 


_(fc) 
V ■ 
uj 


(k) 

u, j £ U] k £ 1C, where denotes the number of days the patient will spend in j given 


(k) 

their initial state was in ward u. The mean of vP' can be computed using Eq. 
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given below. 


_(fc) 

V ■ = 

uj 


E' 

d=l 


‘ST) 


(14) 


(k) 

The second moment of vP \s given by 

PC = PpPS-v 


UJ 


UJ 


U] 


(15) 
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Thus, the variance of the days spent by a patient in a state can be given by 


,(fc) 
u ■ 

uj 


= yu,jeu. 


( 16 ) 


Total hospital length-of-stay (LOS). To get the distribution on LOS for entire hospital stay, 


, u,j £ U] V = 


we calculate the first-passage-time probabilities, denoted by F. (d) 

1,2,...; k £ /C, where f^\d) is the probability that the first passage from state u to j will take 
exactly d days for patients of type k. This event can occur if a patient makes a direct transition from 
u to j on day d, or the patient transitions to any other state I on any day before d and then takes first 
passage from I to j. The second component is recursive and thus takes into account any number of 
transitions between any states (except the absorbing state) to reach state j from u in d days. 


/ST) = Puff'S ETl - d') 




(k) ttF) / fiX) I 


(17) 


(k) - ~ (k) 

Using fuu: where u £ Id, and the initial state probability p we can get the distribution for LOS. 

denotes the first-passage-probability for a patient’s flow from any initial state n to a discharge state u. 


If the initial state is unknown then we use Eq. 18 Otherwise if the initial state is known, say u, then 

(k) 

the distribution is given by itself. 


LW(d) = Y.Pu'’fukd) d=l,2,... 

uFH 


(18) 


3.1.7 Elective and emergency inpatient census model 

In this section, we describe how we integrate the semi-Markov stochastic location processes generated 
from our SMM method with different arrival processes to create a stochastic ward census process. This 
section, as well as Sec. |3.2t presents an elective scheduling optimization approach focused on hospital 
patient throughput (i.e. admission volume) and congestion (e.g. bed block, off-ward placement of 
patients) that is based on the work by Helm and Van Oyen (2015). The purpose of these sections is to 
provide relevant background for possible applications of our CM method in the hospital census forecast¬ 
ing industry as described by our industry co-author. We use the aforementioned optimization approach 
as a proof of concept to test the value of our improved CM method and demonstrate how our CM 
approach integrates seamlessly with existing patient flow optimizations. These sections are, therefore, 
intentionally brief and not intended to present new research in the area of resource optimization. 

There are two broad categories of patients that a hospital serves, elective (EL) and emergency (EM). 


In developing our census model we separate the two because in the optimization in Sec. 3.2 emergency 
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arrivals are considered uncontrollable while the scheduled elective arrivals become the primary decision 
variable. To integrate our SMM clustering and trajectory estimates with the optimization as well as 
the what-if scenarios of interest to the industry, we run the clustering method on EL and EM patients 
separately. Hence each stream, EL and EM, will have its own set of patient types, /C, with their own 
trajectories determined by our SMM. 

As explained in previous sections ( 3.1.1j|3.lT4 1, we cluster the EL patients into homogeneous groups 
with similar trajectories. Trajectory estimates, one for each patient type (cluster), are computed using 


Eqs. 12 and Combining the EL arrival pattern with the semi-Markov trajectory distributions 


for each patient type, discussed in Sec. 3.1.5 creates a stochastic census process that can be used 
to calculate the distribution on patient demand for beds at each ward at any time, t. The exact 
distribution depends on the arrival process. 

For EL admissions we consider a deterministic arrival process, which, when combined with the 
semi-Markovian patient trajectories, yields a Poisson-Binomial distribution on bed demand at fixed 
time point t. The deterministic assumption is an approximation of reality, but has been widely used in 
the literature due to the fact that elective arrivals are controlled and scheduled in advance. Therefore it 
is (1) theoretically possible to achieve close to a deterministic arrival stream, (2) it is highly beneficial 
to patient flow for hospital managers to work toward a deterministic elective arrival stream and should 
be a management priority, (3) deviations from the deterministic arrivals can be incorporated for certain 
distributions and approximated for others — particularly if the variance of the arrival pattern can be 
adequately approximated as a linear function of the mean. 

We model the arrivals of emergency patients using a non-homogeneous Poisson process that varies 
by day of week. Combining these Poisson arrivals with the semi-Markov stochastic location processes 
yields a Poisson-arrival-location model (PALM) of emergency census, (see Massey and Whitt (1993) 
for more details). One feature of a PALM model is that the distribution on demand for beds in any 
ward for fixed t follows a Poisson distribution. 

Having defined the distribution on demand for beds for emergency and elective patients, we now 
briefly describe an optimization model from the literature (Helm and Van Oyen (2015)) that is subse¬ 
quently used to demonstrate the importance of a rigorous patient trajectory estimation procedure. We 
designed our estimation approach to integrate with optimization and what-if scenarios, with this par¬ 
ticular optimization being used as a proof of concept that (1) our method integrates well with current 
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optimization approaches, and ( 2 ) our method signihcantly improves the outcome of the optimization 
when compared with traditional approaches proposed for use with these types of models. 

3.2 Resource Scheduling (RS) MIP model for Elective Admission Scheduling 

The RS model we use as proof of concept integrates both EM and EL census models to capture metrics 

such as blocking and off-ward placement of patients. The two common objectives from the literature 
that we focus on are: 1 ) maximizing the number of elective admissions while constraining congestion 
metrics and 2) minimizing the congestion (e.g. blocking) while maintaining patient throughput. From 
a management perspective, the first objective allows for increased revenue, while the second objective 
provides better access and consequently better outcomes for patients. For ease of reference, we present 
this optimization model in Online Appendix B. 

This concludes the presentation of our CSI approach. In the next section we develop a simulation 
to validate the accuracy of the SMM approach for patient clustering and trajectory estimation and to 
determine the impact of the SMM on optimal solutions to the MIP model. 

4 SMM Validation and Impact on Optimal Scheduling Solutions 

In this section, we perform simulation studies to validate the performance of SMM method. 

4.1 Evaluating the accuracy of the SMM method 

We begin with a detailed analysis of the functionality and performance of our SMM method by per¬ 
forming a simulation study of a hospital system with four transient states (wards), lA = {ui,... ,^ 4 } 
and one absorbing state (discharge/death) U = {D}. We later expand upon this deep-dive to consider 
a variety of other clustering systems. For the initial simulation, flow sequences for 1000 patients were 
generated from four different semi-Markov models (corresponding to four different patient types), de¬ 
noted by ..., As two clusters could be different in P, H, and/or both, we used the following 
setting that covers all possible scenarios. In the data generating model, and have different 
P but same H, and Cs^'^ have same P and different H, while and have different P 
and H. A pictorial representation of the transition probability matrix combined with the initial state 
probability is shown in Fig. In these plots, the darker the color, the higher the probability. The 
component mixture weights, vr, of the four clusters are {0.17,0.33,0.25,0.25} respectively. Addition¬ 
ally, the assignment probabilities in the generating distributions were set less than 0.7 to ensure that 
the simulation output would be similar to that of a general hospital scenario. 

The proposed SMM mixture model was applied to the generated data for various numbers of 
clusters and the Q function was plotted against the number of clusters, \JC\ as shown in Fig. As 
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(a) Data generating transition probabilities 


(b) Estimated cluster’s transition probabilities 


Figure 4: Pictorial representation of transition probabilities as a gray scale heat-map; with higher intensity 
of gray for higher probability. The heat-map for generating and estimated cluster transition probabilities are 
shown side-by-side for visual comparison. 


Table 1: p-values for matched cluster parameters. Higher p-values compared to the significance level indicates 
that the two compared distribntions were same. 


i 

j 

P 

P 

H 

1 

2 

0.99 

0.54 

0.99 

2 

3 

0.99 

0.17 

0.83 

3 

4 

0.99 

0.23 

0.98 

4 

1 

0.99 

0.29 

0.99 


can be seen from the figure, the absolute slope of the Q estimates significantly drops at |/C| =4 with 
estimated tt = {0.169,0.332,0.253, 0.246}, which indicates that the true number of clusters and mixture 
weights were accurately identified by the SMM estimation model. No similar clusters were found by the 


pairwise hypothesis tests discussed in Sec. 3.1.4 To assess the accuracy of the estimated parameters 
7 r(^),p(^),p(*^),for each of the estimated clusters, we compared them with the parameters of 
the data generating model. The pictorial representation of estimated and true probabilities is shown 


in Figures 4a and |4b[ respectively. The high degree of similarity between the plots in these two 
figures implies a highly accurate estimation of initial state and transition probabilities. Additionally, 
we conducted Chi-square and Kolmogorov-Smirnov tests to verify the equality of estimated and true 
parameters. The p-values of these tests reported in Table are all greater than 0.05, indicating that 
the equality of estimated and true parameters (null hypothesis) cannot be rejected, i.e., they are 
statistically the same at a 95% confidence level. In summary, all the results show a clear one-to-one 
mapping between estimated and generating (true) cluster parameters, demonstrating the effectiveness 
of our SMM clustering model at identifying the underlying parameters of the patient flow system. 

Next, we provide deeper insight into the functionality of our SMM method by demonstrating 
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Figure 5: Q function estimates against number of clusters for the simulated data. The improvement in Q 
estimate becomes insignificant after 4 clusters. 


^ Clusters, 

K 

# Patients, 

N 

Path 

lengths’ ^,,(7 

Patients 

paths with 
length > 1 

Run 

Fl-score (%) 

Accuracy (%) Objective 
value 





1 

88.6 

91.6 

208.82 





2 

70.3 

81.4 

201.28 

4 

1000 

43 . 2 , 32.9 

735 

3 

88.9 

91.6 

209.16 





4 

63.8 

78.2 

199.8 





5 

87.9 

90.3 

208.26 






90.8 

92.4 

620.21 





2 

71.4 

85.7 

601.09 

50 

5000 

47 . 5 , 49.6 

4165 

3 

90.2 

91.9 

619.82 





4 

90.7 

92.4 

620.02 





5 

71.8 

85.1 

601.11 






74.6 

86.7 

1110.57 





2 

74.5 

86.8 

1110.21 

100 

10000 

51 . 4 , 53.2 

8504 

3 

88.4 

90.3 

1121.01 





4 

89.7 

91.4 

1121.89 





5 

74.6 

86.5 

1110.78 


Table 2: Clustering results under different initializations. 


the initialization can impact performance and running our method with multiple different random 
initializations helps overcome this challenge. To do so we explore three scenarios with 4, 50 and 100 
clusters respectively. The results are reported in Table The table shows the amount of patient data 
generated in each scenario, and the mean and standard deviation of the length of the simulated patient 
paths. We remove the data where the path is of length 1 (i.e. the patient arrived to the hospital in a 
ward for a single time unit and left), since these patients would not be considered hospital inpatients. 
The sample size remaining is around 75-85% of the original sample (see column 4). 

In this table, we show the clustering results from different runs. Each run has a different random 
initialization. The number of iterations in each run was capped at 50. The fl-score and accuracy 
are shown as clustering performance measures. The results highlight the differences in the clustering 
output for different initializations. As expected, the value of objective function corresponds to the 
accuracy levels - higher the objective value, the higher the accuracy. The bolded rows of the table 
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indicate the solution that was chosen (out of the hve random initializations). 

With respect to robustness, we found that the estimates of SMM parameters in each run (in all 
three scenarios) were found to be statistically similar to the true underlying distributions. This shows 
that, while the hnal cluster assignments may be sensitive to initialization, the output of interest, i.e. 
the semi-Markov parameters, are more robust to initialization. As a precaution, however, we suggest 
multiple random initializations as a means to avoid potentially poor solutions that may be a result of 
a particular initialization. 

Additionally, we show the improvement in objective function and the reduction in cluster reassign¬ 
ment (of trajectories) with each iteration in Fig. The hgure presents the result for the problem with 
50 clusters. The hgure shows that, (1) the convergence of the algorithm as the iterations progress, and 
(2) the objective function reaches a upper bound quickly, but the cluster assignments keep changing, 
although very slightly. The latter observation indicates that multiple cluster solutions gives about the 
same objective function, though the solution quickly becomes relatively stable. This relates to the 
identihability issue and hence our recommendation of setting a maximum on the number of iterations. 


—Likelihood 

-Cluster reassignments 


1500 3 


10 20 30 40 50 

Iteration 


Figure 6: Improvement in objective function and reduction in cluster reassignment as the algorithm conver¬ 
gences. 

4.2 Evaluating the impact of patient flow estimation on scheduling optimization 


As discussed in Sec. 3.1.5 and 3.1.7 the MIP-scheduling model for resource scheduling (RS) uses the 


estimated trajectory distribution of each patient type as an input. In this section, we study the impact 
of a better patient path estimation on patient throughput and ward utilization, which are the outcome 
of RS. To do so, we compare the RS optimization solution using CSI with the true optimal (if the 
actual distribution were known), and with four other common patient clustering methods: k-means 
clustering, DRG-based clustering, Gaussian clustering, and Markov model-based clustering. 

We compare our GSI method with the commonly used attribute-based clustering methods: Di¬ 
agnosis Related Group (DRG)-based clustering and k-means attribute clustering. In these methods. 
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patients are first clustered into groups based on the similarity of their personal and medical attributes. 
While DRG-based clustering uses the patients’ diagnosis (disease) for grouping, k-means uses other 
attributes like age, sex, etc. for a finer grouping. The path distribution for each patient cluster are 
then estimated empirically. Details are given in Online Appendix-C. 

There can be drawbacks to k-means clustering. Sometimes the data density can be non-convex 
or severely unbalanced, which affects the method’s effectiveness. Hence, we also compare our method 
with a Gaussian mixture distribution approach for clustering the patients based on their attributes. 
We also implement and compare our method with a mixture Markov model based clustering method 
(Gadez et ah, 2000). This Markov model clustering is different from SMM due to its assumption that 
the holding time distribution is independent of state transition. 

The patient path distributions, computed from the patient clusters, are supplied to the MIP model 
(the maximum elective admission formulation presented in Sec. 3.2 and Online Appendix B). In Table|^ 
we present the optimization results for the objectives of maximizing throughput and ward utilization 
for the same three scenarios studied in the previous section; cluster sizes K G {4, 50,100}. 



K = 4 


K = 

50 


K = 

100 


Throughput Utilization 

Throughput 

Utilization 

Throughput 

Utilization 

(Optimal) 

85% 

49% 

78% 


34% 

89% 


45% 

CSI 

81% 

49% 

75% 


33% 

88% 


45% 

DRG 

21% 

11% 

24% 


14% 

28% 


19% 

k-means 

24% 

24% 

25% 


16% 

32% 


21% 

Gaussian 

19% 

9% 

18% 


10% 

21% 


10% 

Markov 

58% 

38% 

51% 


21% 

61% 


37% 


Table 3: Percentage increase in service level metrics: throughput from the number of elective patient admission, 
and ward utilization. 


The table contains a row for “optimal”. Here the “optimal” result is drawn by using the known true 
underlying number of patient types and their path distributions. This result serves as the baseline (or, 
the upper bound in this case) to assess the performance of other methods. 

As shown in Table the outcome from GSI is quite close to the optimal. The Markov model is the 
next best method, but falls well short of GSI. This is because, similar to the SMM-clustering used in 
GSI, the Markov model clustering also groups the patients based on similarity in their paths. However, 
the performance is worse than the one from SMM because it assumes a same holding time distribution, 
i.e. the distribution on length-of-stay in a ward (before moving to another) is always the same. This 
means that the Markov model is ignoring a critical feature of ward interactions, i.e. that holding time 
and ward transitions are dependent. 
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The scheduling outcome from empirical patient path distributions (derivation expressions in On¬ 
line Appendix-D) drawn from attribute-based clustering, viz. k-means, DRG and Gaussian, were 
significantly poorer than the optimal. Among them, Gaussian performed the worst because of its 
ineffectiveness in clustering categorical variables present in patient attributes data. 

4.3 Applying SMM-clustering in practice 

In this section, we discuss some of the advantages and disadvantages of the SMM clustering method 
and what types of problems are best suited for applying our method. 

The results of this simulation study show that the proposed GSI yields a schedule in RS that 
is very near the true optimal, and significantly outperforms existing HASG methods. Our SMM 
clustering model outperforms traditional attribute-based clustering methods specifically because it 
takes trajectories into account in the clustering process. Attribute-based clustering, on the other 
hand, relies on an indirect relationship between attributes and patient trajectories rather than directly 
employing trajectories as a clustering approach. This highlights one of the key innovations of our new 
method. Instructively, our SMM-based clustering method also significantly outperforms the Markov 
model-based clustering method, which does consider patient trajectories. This highlights a second 
innovation of our SMM method, which is that we properly consider the interaction between wards. 
That is, our method allows the holding time distribution and ward transitions to be dependent, which 
is ignored in the Markov model-based clustering method. In addition, SMM clustering effectively 
differentiates between patients with different lengths-of-stay at the wards. This more subtle modeling 
difference turns out to have a significant impact on model performance. 

As a result, we find that our SMM-clustering approach is most effective being applied to problems 
with the following characteristics. First, our method performs well in situations where individual 
characteristics available in the data (e.g. age, sex, co-morbidity) are not adequately explanatory of 
trajectories. From a patient flow perspective, this feature is highlighted by the example described by 
Fig-i Glearly there are applications outside of patient flow that share this feature. 

Second, problems with complex and dependent network interactions can cause simpler methods 
to perform poorly, creating significant opportunity for our method to outperform existing clustering 
methods. In particular, non-Markovian networks benefit significantly from relaxing the Markovian 
assumption in the clustering methodology, as ours does. Non-Markovian networks are quite common 
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in healthcare, since a patient’s history has been shown in many contexts to correlate with future 
requirements and outcomes. However, this feature is not unique to patient flow systems. 

Finally, in our context the output of interest is not the clusters themselves, but rather the trajectory 
distributions derived from the clusters. We have shown that our model is quite robust to local optima 
and intializations in terms of the overall trajectory distributions because near the optimal EM solution, 
clusters may continue to change but the overall distributions derived from each cluster remain rela¬ 
tively stable. Our SMM clustering approach provides accurate parameter estimates even if the cluster 
assignment of all the training data is not the global optimal. This also helps mitigate identifiability 
concerns, since different many clustering solutions can generate very similar holding time and tran¬ 
sition distributions, which is the output of interest. Thus applications that rely on the semi-Markov 
distributional output rather than the actual clustering results themselves are best suited for SMM. 

SMM-clustering, however, does rely on Markovian transitions between wards. While this is often 
not strictly true in patient flow systems, much past literature has found this modeling assumption to be 
sufficiently accurate. Performance may suffer, however, if transitions are strongly history dependent, 
thereby making the Markovian assumption a poor representation of reality. In such a situation, the 
trajectory data can be tested for different orders of the Markovian property and the state space may 
be able to be expanded to restore the Markovian property if necessary. 

Another disadvantage is that, due to athe large number of parameters to estimate, SMM-clustering 
requires large amount of data. While, large amounts of data and an underlying Markovian property is 
common for patient flow problems, other methods such as in Ranjan et. al (2016) can be incorporated 
to mitigate data scarcity issues. 

5 Case study on real hospital data 

In this section, we will study the impact of our integrated framework (CSI) on hospital resource 
optimization at a partner hospital, and as a holistic tool for the HASC problem. In particular, we 
focus on validating the trajectory estimation and RS models, as forecasting arrival streams is out of 
the scope of this paper. Hence, we take the arrival stream as given in order to independently evaluate 
the accuracy and impact of trajectory estimation on the HASC problem. 

We use historical data of patient admission and transitions in a hospital with 55 wards including 
surgical, ICU/CCU, medicine, neurology, oncology, obstetrics, etc. Although, physically the hospital 
has more than 55 wards, for simplicity several wards were grouped based on expert prior knowledge 
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about their similarity. This system is a good example of a complex hospital system with general ward 
network structure, transfers and blocking/congestion. 

We obtained one year of data from 2012, with about 11,000 patients who stayed at least one night 
in the hospital. The data set includes the patient flow data, length-of-stay at each ward, and patient 
attribute data, for e.g. age, sex, diagnosis, etc. The ratio of elective and emergency patients in the 
data is almost equal. Patients have an average of 4.1 transfers before leaving the hospital. We compare 
the performance of the CSI model with that of the established clustering and estimation approaches. 

We begin with the CM step by applying SMM-based clustering on patient trajectory data to identify 
patient types. From Fig. we can infer that there are 32 patient types. Again, no redundant clusters 
were found from pairwise hypothesis testing. The trajectory probability distributions for each of 
these patient types are computed using Eq. Simultaneously conventional partition-based clustering 


methods, discussed above in Sec. 4.2 viz. A:-means, DRG and Gaussian clustering, are used to cluster 
patients based on the patient’s attribute data. 

While for DRG, the number of clusters is found from the data (the number of diagnosis types), the 
criteria for Ending the optimal number of clusters with fe-means and Gaussian are rather subjective. 
Therefore, in order to have a fair comparison, we use the same number of clusters as chosen by SMM 
(i.e., 32 clusters). This does not affect the optimization in RS even if we have a few redundant 
clusters, but prevents the risk of suboptimal results due to under-estimation of the number of clusters. 
Therefore, the benehts demonstrated by this case study represent a conservative estimate of the true 
potential benehts when compared to an application to a hospital in the real world. After performing 
the attribute-based clustering, empirical trajectory distributions are estimated for each patient cluster 
using the same approach regardless of clustering method. 



Figure 7: The estimated Q function against increasing number of clusters for the real data in Case Study. It is 
observed that the improvement in Q function is not significant after 32 clusters. 


To verify our claims that two patients with similar attributes may not follow the same trajectory. 
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we observed two patients who were put into the same cluster using the fc-means; they were both male, 
aged between 55-65 years and were diagnosed for heart disease. Their trajectories within hospital are 
shown in Fig. In this hgure, patient^l enters the cardiology ward, transitions to the angiography 
center then to the neurology ward and hnally back to cardiology before leaving the hospital. Patient^2, 
on the other hand, begins their stay in the surgical ward, transitions to the heart clinic, then the ICU, 
then the operating theater, then to the ICU again and hnally back to the surgical ward before being 
discharged. Although the observed attributes for both patients show similar prohles and a heart disease 
diagnosis, the trajectories followed by these patients were very different. Observing their trajectories 
more closely, one can see that patient^2 might have had a severe heart condition, while patient^l had 
a relatively milder heart condition only requiring angiography. 




(a) Patient trajectories from a fc-means duster (b) Patient trajectories from a SMM based cluster 

Figure 8: Trajectories of patients belonging to same cluster. It is observed that patients in SMM based clusters 
follow more similar trajectories than fc-means. 


When employing our SMM-clustering method, we do not see such dissimilarity in patient trajec¬ 


tories within one cluster. As an example. Fig. 8b shows trajectories of a few patients from one of the 
clusters identihed by the SMM approach. Most of the patients in this cluster enter the hospital either 
in surgical or cardiology wards, then transition to the heart clinic, ICU, operating theater and hnally 
cardiology before leaving the hospital. There is one case of patient^6 who entered the hospital in ortho 
and spine center, but then followed similar trajectory of going to heart clinic, ICU, operating theater 
and hnally cardiology. This could be caused by a heart condition developing during an orthopedic 
admission, or possibly due to initial off-ward placement (because the cardiology ward was full). It is 
interesting to see that if we would have used the conventional attribute based clustering this patient 
would have been put into a orthopedic related cluster, while the SMM approach was able to identify 
the patient’s “true” cluster. 

To test the impact of our SMM approach on the RS optimization, we use the maximum elective 
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Throughput 

Utilization 

CSI 

97% 

22% 

DRG 

26% 

11% 

k-means 

30% 

8% 

Gaussian 

22% 

5% 

Markov 

63% 

19% 


Table 4: Comparing the percentage improvement in throughput (elective admissions) and ward utilization 
(workload) for proposed CSI and other traditional methods with respect to the current service and workload 
levels using real hospital data. 


admission formulation given in Sec. 3.2 and Online Appendix B. The goal is to increase the volume of 
patients served (throughput), thereby increasing revenues, while maintaining the same level of service 
and access. The results are shown in Table H] for the CSI and other methods relative to the baseline 
current elective admission schedule of the partner hospital. 

Our CSI method demonstrates a potential increase in elective admissions of 97%, and an increase 
of 22% for ward utilization. Similar to the simulation study, the Markov clustering performs second 
best, with improvements of 63% and 19% for throughput and utilization, respectively. Among the 
other methods, k-means performs the best, yet signihcantly worse than CSI, with improvement of 30% 
and 8%. In practice, attribute partition methods like k-means and DRG are commonly used, though 
they leave much to be desired. 

This case study of a partner hospital demonstrates the importance of an accurate patient clustering 
and trajectory estimation method, as using our CSI not only provides a more accurate forecast of the 
hospital stochastic workload process, but also dramatically improves optimization solutions. Further, 
to the best of our knowledge, our CSI method is the only approach in the extant literature that has 
all the properties required for effective integration with admission scheduling optimization approaches: 
scalable to hospital of any size, considers ward interactions, and accounts for patient heterogeneity. 


6 Conclusion 

The Hospital Admission Scheduling and Control problem is comprised of two main components: census 
modeling and resource scheduling. Previous work on this long-standing problem has considered one or 
the other, but not both. In this paper we develop a new method based on semi-Markov model (SMM) 
based clustering for identifying patient type clusters and estimating cluster trajectory distributions that 
integrates seamlessly with existing scheduling optimization approaches. This integration is proven to be 
extremely important, as optimal solutions using our SMM approach dramatically outperform optimal 
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solutions using the traditional empirical estimation techniques. 

As a theoretical contribution, our novel approach is able to model an entire hospital of any size as 
a coordinated system with a complex, general network of wards and patient transitions between them. 
Further, the model has been shown to be scalable, accounts for ward interactions, and for patient 
heterogeneity, which has not been previously achieved by other methods in the literature. Further, our 
SMM-clustering is a general purpose algorithm applicable to any movement or sequence data having 
spatial and temporal dimension, for example, clickstream data of users on a website or movement of 
cell-phone users among a network of towers. 

Our SMM approach was designed to integrate with RS approaches that provide an optimal control¬ 
lable schedule by patient type for each day of week so this approach can be adopted by any specialty 
or multi-specialty hospital for streamlining their procedures, stabilizing the operating environment for 
their personnel, improving utilization of hospital resources, and enabling cost savings for both patients 
and hospitals. The automated, algorithmic approach to clustering and trajectory estimation is also 
appealing compared to ad-hoc, manual, and heuristic approaches currently employed in practice (which 
can take months to implement and are difficult to validate statistically). 

The SMM-clustering method was validated by simulating data from known generating mixture 
distributions. The SMM estimated clusters and their distributions were found to be statistically the 
same as the generating mixture distributions at a 95% confidence level. Optimizing the elective schedule 
based on inputs from our SMM method achieved outcomes that were very close to the the “true 
optimum” (i.e. given perfect knowledge of patient flow dynamics) while the existing traditional method 
gave performed significantly worse. 

A case study using real hospital data showed that the number of elective admissions could be 
increased by 97% (with the same level of access) compared to only a 30% increase using traditional 
empirical methods (which are comparable to previous optimization improvements reported in the liter¬ 
ature). Moreover, the average ward utilization could be improved by 22% using our approach compared 
with only an 8% improvement using the traditional approach. 

In conclusion, our approach develops a novel method for spatio-temporal clustering and trajectory 
estimation that has a profound impact on an important patient flow problem with the potential to 
improve revenues and/or cost, quality, and access to care. 


31 



References 

[1] Abraham, G., Byrnes, G. B., and Bain, G. A. (2009). Short-term forecasting of emergency inpatient 
flow. IEEE Transactions on Information Technology in Biomedicine, 13(3):380-388. 

[2] Adan, I., Bekkers, J., Dellaert, N., Vissers, J., and Yu, X. (2009). Patient mix optimisation and 
stochastic resource requirements: A case study in cardiothoracic surgery planning. Health care 
management seience, 12(2):129-141. 

[3] Aiken, L. H., Glarke, S. P., Sloane, D. M., Sochalski, J., and Silber, J. H. (2002). Hospital nurse 
staffing and patient mortality, nurse burnout, and job dissatisfaction. Jama, 288(16):1987-1993. 

[4] Armony, M., Israelit, S., Mandelbaum, A., Marmor, Y. N., Tseytlin, Y., and Yom-Tov, G. B. (2015). 
On patient flow in hospitals: A data-based queueing-science perspective, an extended version (ev). 
Technical report. Working paper, http://ie. technion. ac. il/serveng/References/Patient% 20flow% 
20main. pdf. 

[5] Bekker, R. and Koeleman, P. M. (2011). Scheduling admissions and reducing variability in bed 
demand. Health eare management science, 14(3):237-249. 

[6] Billingsley, P. (1960). Statistical inference for markov processes. 

[7] Billingsley, P. (1961). Statistical methods in markov chains. The Annals of Mathematieal Statisties, 
pages 12-40. 

[8] Gadez, I., Heckerman, D., Meek, G., Smyth, P., and White, S. (2000). Visualization of navigation 
patterns on a web site using model-based clustering. In Proceedings of the sixth ACM SIGKDD 
international conferenee on Knowledge diseovery and data mining, pages 280-284. AGM. 

[9] Earnest, A., Ghen, M. I., Ng, D., and Sin, L. Y. (2005). Using autoregressive integrated moving 
average (arima) models to predict and monitor the number of beds occupied during a sars outbreak 
in a tertiary hospital in Singapore. BMC Health Serviees Researeh, 5(1):1. 

[10] Faddy, M. and McGlean, S. (1999). Analysing data on lengths of stay of hospital patients using 
phase-type distributions. Applied Stoehastic Models in Business and Industry, 15(4):311-317. 

[11] Fetter, R. B., Shin, Y., Freeman, J. L., Averill, R. F., and Thompson, J. D. (1980). Gase mix 
definition by diagnosis-related groups. Medieal care, 18(2):i-53. 

[12] Green, L. (2006). Queueing analysis in healthcare. In Patient flow: reducing delay in healtheare 
delivery, pages 281-307. Springer. 

[13] Griffin, J., Xia, S., Peng, S., and Keskinocak, P. (2012). Improving patient flow in an obstetric 
unit. Health eare management seienee, 15(1):1-14. 

[14] Griffith, J. R., Hancock, W. M., and Munson, F. G. (1976). Cost eontrol in hospitals. Health 
Administration Press. 

[15] Hall, R., Belson, D., Murali, P., and Dessouky, M. (2006). Modeling patient flows through the 
healthcare system. In Patient flow: Redueing delay in healtheare delivery, pages 1-44. Springer. 

[16] Hancock, W. M. and Walter, P. F. (1979). The use of computer simulation to develop hospital 
systems. ACM SICSIM Simulation Digest, 10(4):28-32. 

[17] Hancock, W. M. and Walter, P. F. (1983). The" ASCS": Inpatient Admission Seheduling and 
Control System. Health Administration Press. 

[18] Harper, P. R. (2005). A review and comparison of classification algorithms for medical decision 
making. Health Poliey, 71(3):315-331. 


32 



[19] Harper, P. R. and Shahani, A. (2002). Modelling for the planning and management of bed 
capacities in hospitals. Journal of the Operational Researeh Soeiety, 53(1):11-18. 

[20] Helm, J. E. and Van Oyen, M. P. (2014). Design and optimization methods for elective hospital 
admissions. Operations Researeh, 62(6):1265-1282. 

[21] Irvine, V., McClean, S., and Millard, P. (1994). Stochastic models for geriatric in-patient be¬ 
haviour. Mathematieal Medieine and Biology, 11(3):207-216. 

[22] Jacobson, S. H., Hall, S. N., and Swisher, J. R. (2006). Discrete-event simulation of health care 
systems. In Patient flow: Redueing delay in healtheare delivery, pages 211-252. Springer. 

[23] Jones, S. A., Joy, M. P., and Pearson, J. (2002). Forecasting demand of emergency care. Health 
eare management seienee, 5(4):297-305. 

[24] Kao, E. P. (1972). A semi-markov model to predict recovery progress of coronary patients. Health 
Serviees Researeh, 7(3):191. 

[25] Kao, E. P. (1974). Modeling the movement of coronary patients within a hospital by semi-markov 
processes. Operations Researeh, 22(4):683-699. 

[26] Keehan, S., Sisko, A., and Truffer, C. (2007). Expenses for hospital inpatient stays: 2004. Statis- 
tieal Brief, 164. 

[27] Konrad, R., DeSotto, K., Grocela, A., McAuley, P., Wang, J., Lyons, J., and Bruin, M. (2013). 
Modeling the impact of changing patient flow processes in an emergency department: Insights from 
a computer simulation study. Operations Researeh for Health Care, 2(4):66-74. 

[28] Littig, S. J. and Isken, M. W. (2007). Short term hospital occupancy prediction. Health eare 
management seienee, 10(1):47~66. 

[29] Marshall, A. and McClean, S. (2003). Conditional phase-type distributions for modelling patient 
length of stay in hospital. International Transaetions in Operational Researeh, 10(6):565-576. 

[30] Massey, W. A. and Whitt, W. (1993). Networks of infinite-server queues with nonstationary 
poisson input. Queueing Systems, 13(1-3):183-250. 

[31] McLachlan, G. and Krishnan, T. (2007). The EM algorithm and extensions, volume 382. John 
Wiley & Sons. 

[32] Ranjan, C., Ebrahimi, S., and Paynabar, K. (2016). Sequence graph transform (sgt): A feature 
extraction function for sequence data mining. arXiv preprint arXiv:1608.03533. 

[33] Richardson, D. B. et al. (2006). Increase in patient mortality at 10 days associated with emergency 
department overcrowding. Medieal journal of Australia, 184(5):213. 

[34] Ridley, S., Jones, S., Shahani, A., Brampton, W., Nielsen, M., and Rowan, K. (1998). Classification 
treesa possible method for iso-resource grouping in intensive care. Anaesthesia, 53(9):833-840. 

[35] Smallwood, R., Murray, G., Silva, D., Sondik, E., and Klainer, L. (1969). A medical service 
requirements model for health system design. Proeeedings of the IEEE, 57(11):1880-1887. 

[36] Taylor, G., McClean, S., and Millard, P. (2000). Stochastic models of geriatric patient bed occu¬ 
pancy behaviour. Journal of the Royal Statistieal Soeiety: Series A (Statisties in Soeiety), 163(1):39- 
48. 

[37] Thomas, W. H. (1968). A model for predicting recovery progress of coronary patients. Health 
Serviees Researeh, 3(3):185. 


33 



[38] Weiss, E. N., Cohen, M. A., and Hershey, J. C. (1982). An iterative estimation and validation pro¬ 
cedure for specification of semi-markov models with application to hospital patient flow. Operations 
Research, 30(6):1082-1104. 

[39] Wu, C. J. (1983). On the convergence properties of the em algorithm. The Annals of statistics, 
pages 95-103. 

[40] Zeltyn, S., Marmor, Y. N., Mandelbaum, A., Carmeli, B., Greenshpan, O., Mesika, Y., 
Wasserkrug, S., Vortman, P., Shtub, A., Lauterman, T., et al. (2011). Simulation-based models 
of emergency departments:: Operational, tactical, and strategic staffing. ACM Transactions on 
Modeling and Computer Simulation (TOMACS), 21(4):24. 

[41] Zhang, B., Murali, P., Dessouky, M., and Belson, D. (2009). A mixed integer programming 
approach for allocating operating room capacity. Journal of the Operational Research Society, 
60(5):663-673. 


34 



The Impact of Estimation: A New Method for Clustering and 
Trajectory Estimation in Patient Elow Modeling 


Appendices 

Appendix A: Derivation of SMM-clustering update expressions for EM algorithm 


In this appendix, we present the derivation of parameter update expressions for the EM algorithm in 
Sec. 3.1.2[ As mentioned in the section, we have to obtain the posterior distributions of the parameters 
to find their optimal estimates that maximizes Eq. 

We use Dirichlet prior distributions, given in Eq. [m for the parameters. The Dirichlet hyperparam¬ 
eters for parameters in 0 = ^ ^ are denoted by {ai^\a'p\a'p\a^^},k G /C, 

respectively. For each model parameter, the hyperparameters can be set to equal values, if there is no 
specific prior knowledge (non-informative prior). Besides, we assume the parameters are independent. 
Using it with the conditions on probability sums equal to 1 in Eq. and parameter independence 
assumptions gives the following expressions for prior probabilities. 
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Furthermore, using the parameter independence, the prior distribution for 0 is. 


(19) 


p(0) = p(7r)p(p)p(P)p(H) 


( 20 ) 


Plugging Eq. 20 and Eq. |^into Eq. and using the hyperparameters mentioned in Sec. 3.1.2 
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get. 
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where, K{x,y) is an indicator fnnction eqnal to 1 if x = y, Rujiy^^'^) is the connt fnnction eqnal to 
the nnmber of times transition was made from state u to j in trajectory and /Cnj>(y^”^) is the 

connt function equal to the number of times transition was made from state u to j, in trajectory y^"\ 
when length of stay at state u was i' time units. 

As shown in Eq. the posteriors of the model parameters are Dirichlet distributions with up- 
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Appendix B: Elective Scheduling Optimization MIP Formulation 

In this appendix we present, an optimization model from the literature (Helm and Van Oyen (2015)). 
that is used to demonstrate the importance of a rigorous patient trajectory estimation procedure. 
We designed our estimation approach to integrate with optimization and what-if scenarios, with this 
particular optimization being used as a proof of concept that (1) our method integrates well with current 
optimization approaches, and (2) our method signihcantly improves the outcome of the optimization 
when compared with traditional approaches proposed for use with these types of models. We begin 
by describing the model parameters and then present the optimization model with brief description of 
the objective and constraints. For a more detailed description of the optimization approach we refer 
the readers to Helm and Van Oyen (2015). 
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Sets 

/C set of all patient types 

lA set of hospital wards 


Hospital parameters 


c 

T] 

b 


(k) 

-ik) 

R 


vector of ward capacities 

vector of total cancellations attributed for each ward 
limit on the average number of blockages per week 

vector of limit on the average number of off-unit patients allowed for each ward 
current elective admission volume of type k patients on day d 
maximum number of elective admissions of type k allowed on day d 
reward vector where Rk is the reward for admitting patient of type k 


Patient trajectory and census distributions 


'yu\di) 

Pu,di^) 

Pd{n) 


probability that an elective patient of type k requires a bed in ward u, di days after 
admission (trajectory distribution) 

probability that there are n emergency patients demanding a bed in ward u on day d 
probability that there are n emergency patients demanding a bed in the hospital on day d 


Decision Variables 

(k) 

' number of type k G 1C patients scheduled on day d 
5d,n number of blockages if there are n emergency patients in the hospital on day d 

number of ward u off-unit patients on day d if there are n emergency patients in ward u 

The patient trajectory and census distribution parameters are computed offline as explained earlier 
in this section. Since the PALM model for emergency patient bed demand is exogenous to the decision 
variable, this too is calculated off-line, with the results captured as Pu,d{n) and pd{n). We consider 
a weekly planning horizon that repeats itself every week, generating a cyclostationary system that 
varies by day of week. The objective is to maximize the throughput of the sum of elective patient 
admissions (over the planning horizon) of each type weighted by a "reward" vector R (1 denotes a 
column vector of all ones). The reward vector gives flexibility to allow the model to treat one patient 
type differently from another, for example, the model can prioritize one patient type over another with 
respect to patient criticality, projected revenue generated by the admission, or other strategic priority. 
The formulation is as follows: 


maxR • \k • 1 (22) 

0,(5,,5 


s.t. 


38 



Sdi,n >n- + di- d 2 )), 

u£U d 2 =l fcg/C n'=0 

di = 1,...,7; n = 1,2,... 


( 23 ) 


7 oo 


EE Pd{n)Sd,n < b 


d=l n=0 


d = 1,..., 7; n = 1, 2,... 


^cZ,nH-l — ^d,n 

7 oo 7 oo 

> « + X] X] ■ X] + di - d2) - - Vu'^Yj 

d 2 = l k^K, n'=0 d=0n'=0 


(24) 

(25) 

(26) 




n=0 


’^d,n+l 

> 

Od,n 

7 


7 

E'f?’ 

> 

EV' 

d=l 


d=i 


< 

_(fc) 

k>d 


yu £U; di = 1,... ,7; n = 1, 2,... 
yu £ Id; d = 1,... ,7 

d = 1,... ,7; n = 1, 2,... 

yk£)C 

yk £ JC; d=l,...,7 


(27) 

(28) 

(29) 

(30) 




The constraints of this model are primarily for constraining the blockages faced by the patients, 
limiting off-ward placement, and respecting the hospital resonrce limits. Since the pnrpose of this work 
is to demonstrate how CM can be improved by developing methods that integrate with optimization, 
and not to provide new optimization methods, we briefly describe the optimization presented here. 
Greater detail regarding this approach can be fonnd in Helm and Van Oyen (2015). Constraints 23 
calcnlate the nnmber of blocked patients at the hospital level if n emergency patients are in the hospital 
on day di. This sets the helper variable, 5d^n which is snbseqnently nsed to calcnlate expected blockages 
according to the distribntion on the emergency patient bed demand stochastic process in the left hand 


side (LHS) of Constraints 24 by mnltiplying the indicator of whether the patient wonld be blocked 
by the probability of seeing n emergency patients in the hospital. The right hand side constrains the 
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expected blocked patients to be less than some target level, b, which can be chosen by management. 


Constraint 25 is a cnt that is added to the formnlation that significantly improves model solntion speed. 
Similar to the constraints (Eq. 23||^) for blockages, we have constraints in Eq. ^6 28 for approxi¬ 


mating and limiting expected off-nnit censns. An additional term in Eq. 26, rju Yld=o ^d,n' ■Pd(n'), 

snbtracted from the otherwise expected nnmber of off-nnit censns gives patients who were blocked and 
not able to be admitted to the hospital in the first place. 

Constraints in Eq. ensnres that the proper mix of patients is respected. Specifically, it ensnres 
that each patient type has at least as many admissions each week as they did prior to optimization. 
Constraints ensnre that the model respects the hospital resonrce capacity for a day. For example, 
hospitals freqnently avoid admitting elective patients on Snndays, which conld be achieved by setting 

P-slnday = 


Appendix C: Assigning attributes to patients in simulation study 

Here we elaborate on synthesis patient attribntes for simnlation stndy in Sec. For brevity, we show 
it for A = 4. In the data generation step for this problem, after patient trajectories were generated 
from fonr semi-Markov processes, three attribntes, viz. age, gender and diagnosis (with three diagnoses 
being Dl, D2, D3), were assigned to the patients snch that any attribnte triplet has the possibility 
of being in any clnster; e.g. a 30 year old female with diagnosis Dl conld potentially be from any 
of the fonr clnsters. This resembles real-world challenges involved in patient trajectory estimation 
by simnlating the fact that two patients with the same attribntes may have different trajectories; i.e. 
the attribntes are not adeqnately captnring patient heterogeneity. In practice, patient attribntes are 
capable of captnring some of the patient heterogeneity so we ensnre that clnsters contain patients 
whose attribntes are mostly similar by adhering to a near-Pareto principle (see the three attribnte 
generating tables in Table [^. That is, clnsters are composed mostly of similar patient attribntes with 
a mix of patients who have different attribntes. This distribntion of attribntes is designed to be fair 
to the traditional approach and captnre the reality that attribntes do have differentiating power, bnt 
cannot completely specify a patients likely trajectory. 

In Sec. we assign physical attribntes to patient for onr simnlation stndy. We perform a conserva¬ 
tive assignment, in favor of traditional patient clnstering method, by giving higher chance of patients 
within a true clnster having similar attribntes. Tablebelow shows the generating distribntions for the 
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Sex 

Diagnosis 

Cluster 

Age 

Cluster 

M 

F 

Cluster 

D1 

D2 

D3 

1 

iV(20,3) 

1 

80% 

20% 

1 

70% 

20% 

10% 

2 

iV(30,3) 

2 

20% 

80% 

2 

20% 

70% 

10% 

3 

iV(40,3) 

3 

70% 

30% 

3 

10% 

20% 

70% 

4 

iV(50,3) 

4 

30% 

70% 

4 

80% 

10% 

10% 


(a) Normal Distribution for (b) Uniform distribution for (c) Uniform distribution for patient di¬ 
patient age patient sex within clusters agnosis 


Table 5: Generating distributions for patient attributes within true clusters 


patient attributes within each cluster. As shown in the table, age is taken from a normal distribution 


with different means (Table 5a), sex and diagnosis (Table 5b 5c) are taken according to a Bernoulli 
random variable with different success probabilities. The distribution parameters are chosen such that 
there is high attribute similarity (dissimilarity) between patients within (between) clusters. 


Appendix D: Empirical Estimation of Patient Trajectories 

Once the clusters have been formed using k-means clustering, the trajectory distribution is computed 
for each cluster independently by normalizing the frequency of transitions of patients between wards 
as follows: 

1 . k^JCandu^U. 

E„'Gw[E„=iK(yl h«')J 
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